Yuzuru Tanahashi, ytanahashi@ucdavis.edu [PRIMARY contact]
Yingcai Wu, ycwu@ucdavis.edu
Kwan-Liu Ma, ma@cs.ucdavis.edu
VIDI Research Group, University of California, Davis
For this mini challenge, we have implemented an interactive visualization tool using Netzen,
a visual network analysis system created by the VIDI Research Group at
the University of California, Davis.
To analyze the data efficiently,
we added line charts and stack graphs to the system.
We selected these two plot types because each supports a different part of the analysis:
line charts show how a value changes over time, and stack graphs show how much each symptom contributes to a total.
The line charts were implemented with a function
that enables interactive smoothing by approximating the trajectory of the lines.
The approximation is calculated by averaging each point with its neighboring values.
With this additional feature, we
could easily switch between analyzing
the simplified, approximated data and the actual, precise data.
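The neighbor-averaging idea can be illustrated with a short sketch. This is not the Netzen implementation; the function name `smooth` and the `radius` parameter are assumptions made for illustration, with `radius` standing in for whatever control the interactive smoothing exposes.

```python
def smooth(values, radius=1):
    """Average each point with up to `radius` neighbors on each side.

    `radius` is a hypothetical smoothing control: 0 returns the actual
    data unchanged, and larger values give a more simplified line.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        # Clamp the window at the ends of the series.
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


daily_rates = [0, 3, 0, 3, 0]
print(smooth(daily_rates, radius=1))  # smoothed view
print(smooth(daily_rates, radius=0))  # actual data
```

Setting the radius back to zero recovers the original series, which mirrors moving between the approximated and the actual data.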
The stack graphs were implemented with a function
that enabled the users to highlight features in stacks
by changing the opacity.
The features to highlight, such as the width or the change in width, were also configurable by the users.
This additional function had a significant effect on
reducing the time needed to identify the key symptoms of the disease.
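One way such opacity-based highlighting could work is sketched below. The text does not state how a feature is mapped to opacity, so the linear mapping, the function name `layer_opacities`, and the `min_opacity` floor are all assumptions for illustration.

```python
def layer_opacities(layers, feature="width", min_opacity=0.1):
    """Map each stack layer to an opacity between min_opacity and 1.0.

    layers: dict of layer name -> list of stack widths over time.
    feature: "width" scores a layer by its mean width; "change" scores it
             by its largest step-to-step change in width.
    The linear score-to-opacity mapping is an assumption of this sketch.
    """
    scores = {}
    for name, widths in layers.items():
        if feature == "width":
            scores[name] = sum(widths) / len(widths)
        elif feature == "change":
            steps = [abs(b - a) for a, b in zip(widths, widths[1:])]
            scores[name] = max(steps) if steps else 0.0
        else:
            raise ValueError("feature must be 'width' or 'change'")
    top = max(scores.values(), default=0.0)
    if top == 0:
        # Nothing stands out, so every layer gets the minimum opacity.
        return {name: min_opacity for name in scores}
    return {
        name: min_opacity + (1.0 - min_opacity) * score / top
        for name, score in scores.items()
    }


stacks = {"FEVER": [10, 20, 30], "COUGH": [1, 1, 1]}
print(layer_opacities(stacks, feature="width"))
print(layer_opacities(stacks, feature="change"))
```

Under this mapping, the layer with the strongest feature is drawn fully opaque and the others fade toward the floor value.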
It took us
about two weeks to implement these plotting features and enhancements.
The resulting tool allows
us to characterize the spread of a disease more easily.
Video:
[.mov]
(66.9MB)
This video demonstrates the interactive exploration of our analysis.
ANSWERS:
MC2.1: Analyze the records you have
been given to characterize the spread of the disease. You should take into consideration symptoms
of the disease, mortality rates, temporal patterns of
the onset, peak and recovery of the disease.
Health officials hope that whatever tools are developed to analyze this
data might be available for the next epidemic outbreak. They are looking for visualization tools that
will save them analysis time so they can react quickly.
Because the data set is large and has gaps
(Thailand, Turkey, and Venezuela are missing one day of data),
we compiled the data into a daily percentile format.
After this compilation, we were left with 75 records of daily
data, from April 16th to June 29th, 2009, for each area.
In this percentile data, we assigned each symptom two values:
HS (Hospitalize Symptom) and DP (Death Probability).
HS is the percentage of patients with the given
symptom out of the total number of patients.
DP is the mortality rate among patients with the given symptom.
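As a minimal sketch of these two definitions, the following computes HS and DP for one area on one day. The record layout (one `(symptom, died)` pair per patient) and the function name `daily_hs_dp` are hypothetical; the actual challenge data and our compilation step may differ in detail.

```python
from collections import Counter


def daily_hs_dp(records):
    """Compute HS and DP per symptom for one area on one day.

    records: iterable of (symptom, died) pairs, one per patient. This
             layout is an assumption made for this sketch.
    Returns a dict of symptom -> (HS, DP), both in percent.
    """
    admitted = Counter()
    deaths = Counter()
    for symptom, died in records:
        admitted[symptom] += 1
        if died:
            deaths[symptom] += 1
    total = sum(admitted.values())
    return {
        symptom: (
            100.0 * count / total,            # HS: share of all patients
            100.0 * deaths[symptom] / count,  # DP: mortality for this symptom
        )
        for symptom, count in admitted.items()
    }


day = [("FEVER", True), ("FEVER", False), ("FEVER", False),
       ("FEVER", False), ("COUGH", False)]
print(daily_hs_dp(day))
```

Because both values are ratios, days with different patient counts can be compared directly.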
We did not find any significant results when we analyzed the
gender and age aspects of the data. Therefore, we did not consider
these in the rest of the study.
Figure 1 shows line charts of the mortality rates for each area.
From these charts, we can immediately see that all areas except
Thailand and Turkey show a large rise in mortality rates.
From Figure 1, we can also see that the epidemic
began around April 23rd (circled in red)
and reached its peak around May 15th (circled in blue),
about three weeks after the outbreak. Depending on the area, recovery took about
three to four weeks.
Next, in order to analyze the symptoms of the disease,
we created stack graphs of the HS data,
shown in Figure 2.
By highlighting HS rates, we can instantly see
that the key symptoms of the disease were "ABDOMINAL PAIN", "DIARRHEA", "FEVER", and "VOMITING".
Although "BACK PAIN" also seems to be one of the
key symptoms, its stack width is almost half that of the others. Therefore, for simplicity, we will concentrate on these four major symptoms.
By identifying the key symptoms, we can further analyze the disease by filtering out other unimportant symptoms. Figure 3 shows the stack graphs of DP data of the key symptoms.
Figure 3. Stack graph of DP data of "ABDOMINAL PAIN", "DIARRHEA", "FEVER", and "VOMITING". The blue lines are indicators for May 30th. The red boxes are indicators of the onset of the epidemic.
From the red boxes in Figure 3, we can see that in all areas except Thailand and Turkey, it takes about three to five days for the DP values to reach their peak. We take this to indicate that the disease had an incubation period of about one to five days. Figure 3 also shows that the DP values of the symptoms tend to remain stable at around 12.0 during the epidemic. This indicates that patients hospitalized with the disease had about an 88.0 percent chance of survival.
MC2.2: Compare the outbreak
across cities. Factors to consider
include timing of outbreaks, numbers of people infected and recovery ability of
the individual cities. Identify any
anomalies you found.
By comparing the onset timings
in the graphs shown in Figure 1,
we can see that the epidemic
first started in Nairobi, then immediately
spread to the Middle East (Aleppo, Karachi, Saudi Arabia, and so on)
and South America (Colombia, Venezuela).
The red boxes in Figure 3 also support this observation.
Although most areas in the Middle East were exposed to the epidemic,
no outbreak occurred in Turkey.
Thailand also showed no sign of the epidemic throughout the period.
In Figure 4, it is not possible
to see any temporal differences when the actual
data is simply plotted on a line chart, as shown in (a).
With our smoothing function, however, we can easily observe
that areas such as Lebanon, Venezuela, and Saudi Arabia
did not have as significant a growth in the number
of patients as other areas such as Aleppo, Karachi, Nairobi, and Yemen.
Throughout the analysis, we did not find significant differences
between areas in their recovery ability.
However, in Figure 3 (blue vertical lines), we can observe that
the DP values of almost all areas start to decrease from May 30th.
In our opinion, this significant decrease indicates that a new way
to treat the disease might have been found around that time.
In Figure 1, there are several spikes of high mortality
rates (circled in green).
These spikes can also be seen in Figures 2 and 3.
Because these spikes are irregular and extraordinarily high,
we suspect that they may come from forged data.
Figure 5. Stack graphs of DP data of all symptoms. In this graph,
the key symptoms of the disease
are not highlighted.
Figure 5 shows the stack graphs of the DP data for all symptoms.
From Figure 5, we can observe that the DP values of many symptoms rise during the period of the epidemic.
This indicates that patients
who were hospitalized with other, miscellaneous symptoms also faced a higher mortality rate than usual.
In addition, we can observe that
symptoms containing the keyword "VAGINAL" are highlighted in many of these graphs.
After investigating symptoms
such as "VAGINAL BLEEDING", we believe
that these symptoms are likely to be strongly associated with HIV patients.
Assuming this is true, Figure 5 also conveys the vulnerability of HIV
patients to the disease and the size of the HIV patient population
in the different areas.